Precision-Recall-Gain Curves: PR Analysis Done Right

Authors

  • Peter A. Flach
  • Meelis Kull
Abstract

Precision-Recall analysis abounds in applications of binary classification where true negatives do not add value and hence should not affect assessment of the classifier’s performance. Perhaps inspired by the many advantages of receiver operating characteristic (ROC) curves and the area under such curves for accuracy-based performance assessment, many researchers have taken to reporting Precision-Recall (PR) curves and associated areas as a performance metric. We demonstrate in this paper that this practice is fraught with difficulties, mainly because of incoherent scale assumptions – e.g., the area under a PR curve takes the arithmetic mean of precision values whereas the Fβ score applies the harmonic mean. We show how to fix this by plotting PR curves in a different coordinate system, and demonstrate that the new Precision-Recall-Gain curves inherit all key advantages of ROC curves. In particular, the area under Precision-Recall-Gain curves conveys an expected F1 score on a harmonic scale, and the convex hull of a Precision-Recall-Gain curve allows us to calibrate the classifier’s scores so as to determine, for each operating point on the convex hull, the interval of β values for which the point optimises Fβ. We demonstrate experimentally that the area under traditional PR curves can easily favour models with lower expected F1 score than others, and so the use of Precision-Recall-Gain curves will result in better model selection.
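The abstract does not spell out the "different coordinate system", so the following is only a minimal sketch under stated assumptions. It assumes the gain transform has the form (x − π) / ((1 − π) · x), where π is the proportion of positives, which is our reading of the full paper and should be checked against it. The function names `f_beta` and `gain` and the numbers used are hypothetical, chosen for illustration only.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: a weighted harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def gain(value: float, pi: float) -> float:
    """Assumed gain transform of a precision, recall or F-score `value`,
    given the proportion of positives `pi` (requires 0 < pi < 1, value > 0)."""
    return (value - pi) / ((1.0 - pi) * value)


if __name__ == "__main__":
    # Made-up illustrative numbers, not results from the paper.
    precision, recall, pi = 0.5, 0.8, 0.2

    prec_gain = gain(precision, pi)
    rec_gain = gain(recall, pi)
    f1_gain = gain(f_beta(precision, recall), pi)

    print("precision gain:", prec_gain)
    print("recall gain:   ", rec_gain)
    print("F1 gain:       ", f1_gain)
    # Under the assumed transform, the gain of F1 coincides with the
    # arithmetic mean of precision gain and recall gain.
    print("matches arithmetic mean:",
          abs(f1_gain - (prec_gain + rec_gain) / 2.0) < 1e-12)
```

Under this assumed transform the gain is linear in 1/x, so a harmonic mean in the original coordinates becomes an arithmetic mean in gain coordinates. That is one way to see why averaging over such a curve would be consistent with the harmonic scale of the F-score, whereas averaging raw precision values is not.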

Similar resources

PRROC: computing and visualizing precision-recall and receiver operating characteristic curves in R

Precision-recall (PR) and receiver operating characteristic (ROC) curves are valuable measures of classifier performance. Here, we present the R-package PRROC, which allows for computing and visualizing both PR and ROC curves. In contrast to available R-packages, PRROC allows for computing PR and ROC curves and areas under these curves for soft-labeled data using a continuous interpolation betw...

Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation

Precision-recall (PR) curves and the areas under them are widely used to summarize machine learning results, especially for data sets exhibiting class skew. They are often used analogously to ROC curves and the area under ROC curves. It is known that PR curves vary as class skew changes. What was not recognized before this paper is that there is a region of PR space that is completely unachieva...

Precision and Recall Without Ground Truth

In this paper we present a way to use precision and recall measures in total absence of ground truth. Precision Pr and Recall Rc (and often associated F-measure or ROC curves) are standard metrics expressing the quality of Information Retrieval methods [8]. They are usually expressed with respect to a query q (or averaged over a series...

3D Semantic Parsing of Large-Scale Indoor Spaces Supplementary Material

The supplementary material contains: (I) most importantly, a video that shows detailed and comprehensive results, as well as a PDF which includes (II) more experimental results in terms of PR curves, sample RGB-D detection results, and comparison to the baselines, (III) a few example real-world applications of large-scale semantic parsing, (IV) implementation details of CRF to enforce contextua...

How Good Are My Predictions? Efficiently Approximating Precision-Recall Curves for Massive Datasets

Large scale machine learning produces massive datasets whose items are often associated with a confidence level and can thus be ranked. However, computing the precision of these resources requires human annotation, which is often prohibitively expensive and is therefore skipped. We consider the problem of cost-effectively approximating precision-recall (PR) or ROC curves for such systems. Our no...


Publication date: 2015